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Abstract 

We review the use of Bayesian Model Averaging in astrophysics. We first introduce the statistical 
^ , basis of Bayesian Model Selection and Model Averaging. We discuss methods to calculate the model- 

■ averaged posteriors, including Markov Chain Monte Carlo (MCMC), nested sampling, Population 

Monte Carlo, and Reversible Jump MCMC. We then review some applications of Bayesian Model 
Averaging in astrophysics, including measurements of the dark energy and primordial power spectrum 
parameters in cosmology, cluster weak lensing and Sunyaev-Zel'dovich effect data, estimating distances 
to Cepheids, and classifying variable stars. 
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1 Introduction 



We are in an unparalleled time in modern astrophysics, a belle epoque where more datasets and theoretical 
models are available than ever before, with the number increasing at an exponential rate. This is driven 
by Moore's law, as new technology has allowed the development of better detectors to make observations 
and more powerful computers to produce simulations. 
■ In the middle, acting as the interface between these two areas, is the field of statistical analysis, and 

I again the growth in available processing power has had a remarkable impact. Before the last decade, 

Bayesian statistics were rarely applied in astrophysics, due to the computationally intensive nature of 
' computing the relevant quantities accurately. Now posterior samples and model selection statistics can 

. be computed fast enough that the space of possible theories can be fully investigated. Comprehensive 

CO ' overviews of Bayesian methodology are given by MacKay, Gregory, and Sivia & Skilling O dj [3] ; for a 

collection of articles on such methods as applied to cosmology see ref 2j . 

The strength of Bayesian methods is the ability to assign a probability value to a model directly, based 
on the parameter ranges allowed and the data available. Model selection should proceed first, before 
parameter estimation, and only once the best model has been found should the parameters be estimated. 
Often though, the data is not strong enough to distinguish decisively between the models. In this case, 
computing the parameter constraints under the assumption of an individual model underestimate the true 
uncertainty of those parameters. 

The solution to this problem is Bayesian Model Averaging, where the uncertainty as to the correct 
model is folded into the calculation of the parameter probabilities. Each individual posterior, generated 
under the assumption of a particular model, is weighted by the model likelihood and then combined with 
the others to give the model-independent posterior. In this paper we review Bayesian Model Averaging 
and its previous applications in astrophysics and cosmology. 

In our review we found that Bayesian Model Averaging is used in two manners. Firstly there are 
those situtations where the likelihood is well understood and it is only the nature of the prior models 
and parameters that are varied and averaged over. This is typically the case in cosmology, where the 
cosmic microwave background (CMB) and distance rulers are well measured observablcs. However, these 
observations indicate the existence of new physical processes, such as inflation and dark energy, for which 
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there arc no well-established physical mechanisms. Then there are the cases where the physical model being 
investigated is well understood, but the model of data acquisition or interpretation is more ambiguous, 
leading to nuisance models that need to be averaged over. This is the case for galaxy clusters, Cepheid 
variable distances, and star classification. Of course this kind of division can be somewhat arbitrary, as 
the models considered as nuisance for one investigator maybe of great interest for another. 

In Section [2] we introduce the ideas of Bayesian Model Selection and Model Averaging. In Section [3] we 
discuss some of the methods used to compute the model-averaged posterior distributions, while in Section 
m we review the applications of Bayesian Model Averaging that exists in the astrophysics literature. In 
Section [5] we conclude. 



2 Model selection and model averaging 

Bayes' theorem states the relationship between models parameters {6), and data {D) 

where P{9\M.) is the prior probability distribution of the parameters (assuming some model TM), P{D\9, 7W) 
is the likelihood, and P{6\D,J\A) is the posterior probability distribution of the parameters. The prior is 
updated to the posterior by the likelihood. The term P{D\M.) represents the model likelihood, and is 
called the evidence. In the case of single-model inference (where only a single model or set of parameters is 
considered), the evidence is simply a normalizing constant, set to satisfy the condition that the posterior 
distribution sums to unity. 

However, in most interesting cases in astrophysics, the correct model is not known, and the evidence 
takes different values for different models. We can use Bayes' theorem again, at one level above, to 
calculate the posterior odds between different models and perform model selection^ 

P{M,\D) ^ P{D\M,)P{M,) 

P{M2\D) P{D\M2)P{M2) ■ ^ ' 

Here Mi and M.2 are the different models under consideration, P{M.i) gives the prior probability of 
each model, and P{M.i\D) is the model posterior probability. Thus the model posterior probability is 
updated from the prior by the evidence. If the model priors are equal then the ratio of posteriors is 
simply the ratio of evidences. The ratio of evidences is commonly known as the Bayes factor B [S|, and an 
interpretation scale was suggested by |B] (though some authors have started to use different language to 
qualify the different levels, e.g. [7])- Many papers have already been written about the use of the evidence 
for cosmological model selection, a collection of the earliest being [51 UHl [HI 1121 [131 [HI [IS] • For reviews 
see refs [H [13 [H] . 

The logical procedure would be to first perform a model selection analysis to find the best model. 
Having done so, we would then perform parameter inference for the parameters of that single best model. 
However it is possible, even likely, that no model will have decisive evidence (In B > 5) over all competing 
models. If we want to include this model uncertainty in the parameter posteriors, we can instead produce 
a model- averaged posterior distribution [Hj, where the individual posteriors from each model are summed 
together, weighted by the model posterior values: 

This model-averaged posterior encodes the uncertainty as to the correct model. 



3 Methods 

Bayesian Model Averaging requires the computation of two quantities per model, the individual model 
posterior and the evidence. Astrophysical data are complex, requiring (at the very least) differential equa- 
tions to be integrated over time and/or space to make predictions that can be tested against observations. 
As such, almost no likelihood function can be solved for analytically, and so the posterior and evidence 
have to be estimated by sampling methods. 
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Since such Bayesian methods, especiahy model selection and model averaging, have only recently 
become popular in their application in astrophysics, there are quite a number of methods available in the 
statistics literature that remain unconsidered in this review. For overview of some other methods see, for 
example ref [5D]. 

3.1 Markov Chain Monte Carlo 

Markov Chain Monte Carlo is a sampling method whose objective is to generate a chain of sampled 
parameter values that obeys Markovian statistics, and has the distribution being sampled as its equilibrium 
distribution. Estimates of the underlying moments of the probability distribution are simply estimates of 
the moments of the chain, for example the mean ^ and variance given by 

1 " 

^,^£(6) = (4) 

i—l 
1 " 

1=1 

Similarly, we could use the chain to estimate the evidence, 

1 " 

P{D\M)^Ec^-y^L{9,)Pr{e,), (6) 

i=i 

where L{9) is the likelihood, P{D\6, AI), and Pt{9) is the prior, P{d\D), but this does not provide a valid 
(either unbiased or converging) approximation. 

Since samples cannot be drawn directly from the distribution initially, MCMC conducts a random 
walk through the available parameter space, generating points and adding them to the chain based on a 
set of rules. The most common form of MCMC used in astrophysics is the Metropolis-Hastings algorithm 

[2I1I22]- 

MCMC has been used to generate posterior samples for a wide variety of problems in astrophysics, 
becoming prevalent in cosmology following the creation of the CosmoMC code by Lewis and Bridle |23| . 
MCMC methods are very efhcient at tracing uni-modal posterior probability distributions without pro- 
nounced degeneracies in the vicinity of the best-fit region. However, the evidence calculation receives 
non-negligible contributions from the tails of the distribution, because even though the likelihoods there 
are small, this region occupies a large volume of the prior probability space. To compute the evidence 
accurately, the entire prior space has to be sampled, which can take a very long time using simple MCMC 
methods. 



3.2 Nested sampling 

The nested sampling algorithm was devised by Skilling [35] as a method to compute the evidence. 
Nested sampling is a Monte Carlo method that recasts the multi-dimensional evidence integral as a one- 
dimensional integral in terms of the prior mass X, where dX = PT{6)d6, with X running from to 1. The 
integral is simply the sum of the likelihood- weighted prior mass samples, 

E = I L{X)dX = ^ , E, = -fix,-, + X,+,) . (7) 

This is shown in Figure [T] 

In order to compute the integral accurately the prior mass is logarithmically sampled. Firstly a set of 
N 'live' points is randomly distributed within the prior parameter space. Then the lowest likelihood point 
Lj is discarded, replaced by a new point uniformly sampled from the remaining prior mass (i.e. it must 
have a likelihood greater than Lj). At every iteration the remaining prior mass, Xj, shrinks by a known 
probabilistic factor that can be estimated from the number of live points, and the evidence is incremented 
accordingly. In this way the algorithm works its way towards the higher likelihood regions. 

Nested sampling was adapted for use in cosmology by Mukherjee, et al. [26] . creating the CosmoNest 
code, and by [27j . with MultiNest. The key challenge of applying nested sampling is the uniform sampling 
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Figure 1: A schematic of the nested samphng algorithm. The two-dimensional parameter space is shown 
at the top right. The points within it are considered to represent contours of constant likelihood, which sit 
within each other like layers of an onion (there is however no need for them to be simply connected). The 
volume corresponding to each thin shell of likelihood is computed by the algorithm, allowing the integral 
for the evidence to be accumulated as shown in the graph. (From |26j ) 

of the remaining prior mass (as if the sampling is not uniform, the resulting Evidence value will be 
biased). CosmoNest achieved this by creating a single ellipsoid with contained all the remaining live 
points, and sampling uniformly from this shape. MultiNest advanced this to automatically generate 
multiple ellipsoids, to cover multiple likelihood peaks and increase the efficiency of the sampling process. 

One of the advantages of using nested sampling is that the samples, as well as contributing to the 
evidence integral, can also be used as posterior samples. Although they don't obey Markovian statistics, 
their posterior weight can easily be calculated from Bayes' theorem, using their likelihood and the prior 
mass associated with them. 

3.3 Population Monte Carlo 

Population Monte Carlo is a generic adaptive importance sampling method |28| . Importance sampling 
is the idea that you can generate your samples completely stochastically from some distribution. In 
importance sampling we draw samples from some proposal distribution q, such that more samples are 
drawn from regions which are judged to be important. The weights of each of the samples is given as 



For example, if there is some new data that will update constraints already in place from some previous 
experiment, you could simply recompute the new posterior (P) at points given by a previous MCMC 
chain (the values of q) , importance sampling using the chain as the proposal distribution. Obviously the 
distribution of points is unchanged, but the new likelihoods give the new weights (or importances) of the 
samples. 

Similarly to MCMC, we can use importance sampling to compute the evidence, since it will simply be 
the average of the weights. 



Wi = 



(8) 




(9) 



i=l 



Adaptive Importance Sampling (otherwise known as Population Monte Carlo) updates the proposal 
distribution with each iteration, attempting to move it closer to the distribution being sampled. Douc et al 
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Table 1: List of cosniological parameters and their approximate date of introduction into the standard 
cosmological model. 



Cosniological parameter 


Symbol 


Introduction date 


Alternate versions 


Hubble parameter 


Ho 


1929 


Distance to last scattering surface 


Matter density 


^111 


1924 


Physical matter density, = 0,n/i^ 


CMB Temperature 


?CMB 


1964 


Radiation density 0, 


Cold Dark Matter density 




(1933) 1975 




Galaxy Bias 


b 


1985 




Amplitude of density perturbations 


As 


1992 


CMB normalization AT/T, dispersion at 8/i~^Mpc, erg 


Optical depth to reionization 


T 


2003 


Redshift of reionization, z^^ian 


Spectral index of scalar perturbations 




2006 





[53] proposed to minimize the distance between the target distribution tt (our posterior) and the proposal 
distribution q as defined by the KuUback-Leibler divergence (the relative Shannon entropy between the 
two distributions [30]). This is defined to be 

DKL{p\\q) = I p{e)\og^de, (10) 

where p and q are two distributions, and a; is a set of variables that the distributions depend on. Rather 
than integrate over the parameter space, we can compute this function by simply summing it over the 
samples, i.e. 

^KL(7r||g)«f;P(&01og^. (11) 

Kilbinger et al. [21], following the approach of Cappe et al. [1H1I32]: applied Population Monte-Carlo to 
cosmology, using an implementation where the importance function q were modelled by a sum of mixtures 

D 

qx{x) = J2^dfYA^). (12) 

d=l 

where a = (ai, . . . ,a£)) is a vector of adaptable weights for the mixture components (with > and 
Sd=i ^d, = 1) and Yd a vector that specifies the components. Here / is a parameterized probability 
density function, usually taken to be a multi-variate Gaussian or Student-t distribution. In these cases, 
the updated weights and distribution parameters that minimise the KL divergence between the two 
distributions can be solved for analytically, details of which can be found in ref. [32| . 

Population Monte Carlo suffers from some of the problems faced by ordinary MCMC. For example, 
when the mixture distribution doesn't cover all the modes in a multi-modal distribution, it will fail to 
pick up contributions from these modes. 

One advantage of PMC is that it allows us to break somewhat from the 'sequential tyranny' of ordi- 
nary Markov Chain methods. Since the importance function is known, large numbers of samples can be 
generated from it in parallel^ making it ideally suited for use in Graphics Processing Unit (GPU) com- 
puting, since the CPUs can perform a very large number of parallel floating point operations. Of course, 
the update step still requires these samples to be brought together and analysed so that the importance 
function can be updated. Kilbinger et al. [31] state that PMC gives an unbiased estimator of the evidence 
at each iteration, and will converge on a high accuracy value in a very small number of such iterations. 

As with Nested Sampling, PMC produces samples of the posterior, and so can be used for generating 
model averaged posteriors. 

3.4 Transdimensional MCMC 

The biggest time requirement in using these previous methods for Bayesian Model Averaging is that the 
posteriors and evidences for each model need to be computed for each model individually. It could be 
more efficient to instead do all of this in one step, simultaneously sampling from the model space as 
well as the parameter values. For this we need an MCMC algorithm that can change dimensionality, 
while still maintaining reversibility to guarantee the chains elements satisfy detailed balance. Examples of 
transdimensional Monte Carlo include Reversible Jump MCMC birth-and-death process Monte-Carlo 
[31], and Continuous Time MCMC [35]. 



6 



Statistical Analysis and Data Mining 



Reversible Jump MCMC is an algorithm in which the dimensionality of the model ean be changed 
during the chain, by allowing transitions between models as well as parameter values. In order to maintain 
detailed balance, Green (1995) proposed a new acceptance probability. The general acceptance ratio from 
state X ~ {1, 6'^^^) to state X ~ (2, 6'^^-'), where 1 and 2 represent the model spaces that the two different 
states occupy, as 



a„i(X, X') 



(13) 



P(2,g(^)|j?)j-(2,g(^ 
'P(l,0(i)|Z?)j(l,e(i))<Zi(wW) 



)'?2(m 



(2)1 



ag(0(2)^y(2 



9(6'(i),u(i)) 



where P(()\U) is the usual posterior value, q is the probability density for parameter values inside that 
model, J is the probability for steps between models, and u is an extra random variable that controls the 
proposal of different models. The final term is a Jacobian of the function g{9,u) that converts parameter 



,(2) 



If these are independent, then the Jacobian is 



values between models, 
simply unity. 

The drawback of RJMCMC is that in practice, for the case of a small number of models, it isn't more 
efficient to sample simultaneously in both model and parameter space, since the steps between models 
often have a low acceptance rate. Instead, RJMCMC is more useful in cases where the dimensionality of 
the problem is large. For example, it has been applied in astrophysics to detection problems where the 
number of sources is unknown, such as in gravitational wave astronomy |36j . 



4 Applications in Astrophysics 
4.1 Cosmological Parameters 

Cosmological model development has proceeded incrementally, with a new parameter being added to the 
concordance cosmological model at a rate of approximately one a decade (though starting to increase 
faster in recent years: see Table [T] for an approximate history). Cosmological models rarely make specific 
predictions for the values of the parameters. For example, there is no theory that predicts exactly the 
value of the Hubble parameter, though you can infer a plausible range from other data, such as the age of 
the oldest stars in the Universe. The models that do predict values for the observables, for example models 
of inflation that predict specific values of rig (the spectral index) and r (the tensor-to-scalar amplitude 
ratio), do so by changing the free parameter to something more fundamental to the theory (e.g. the value 
of the mass scale or coupling in the inflationary potential) . 

Therefore, because there are no fundamental models that make very accurate predictions for the values 
of the cosmological parameters, they are phenomenological parameters. They can be measured accurately, 
but little or no fundamental physics can (yet) be derived from them. This isn't true for all cosmological 
parameters, and may not be always true for others. The optical depth to reionization, for example, could 
be accurately predicted if we had a perfect model for the formation of the first generation of stars and the 
reionization of the cosmic hydrogen. Alas, perfect models are found only in mathematics, and physicists 
have to live in the real Universe. 

The lack of a good theory to predict these parameters means that the relevant priors may be subject 
to a level of contention. 



4.1.1 Dark Energy 

The mysterious dark energy makes up around 70% of the energy density of the Universe, and acts against 
gravity to drive an accelerating expansion. Its discovery in the late 1990s was largely unexpected, and 
even today there are no very compelling models for where it comes from. The Dark Energy Task force 
report stated that "the nature of dark energy ranks among the very most compelling of all outstanding 
problems in physical science.^' |37| . 

The dynamical properties of the dark energy are normally parameterized in terms of the equation of 
state parameter w, where 

w=-. (14) 
P 

Here P is the pressure of the dark energy fiuid and p is the density. Different models of the dark energy 
make different predictions for the value of the equation of state. The cosmological constant model, where 
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Figure 2: Posterior parameter distributions for dark energy equation of state parameters wq (the value 
today) and Wa (the rate of change with scale factor). These distributions were obtained using the 
WMAP+SDSS(BAO)+SNLS data combination averaging over the quintessence-type models (I, II and 
V) alone. Some smoothing of the delta- functions has been carried out by binning. (From |38| ) 



the dark energy is the zero-point energy of the vacuum, predicts a value of w = — 1 only. Quintessence 
models, where canonical scalar fields are slowly rolling down some potential, can take time-dependent 
values in the range —1 < w < 1. 

Liddle et al. [55] addressed the problem of model independent constraints on the equation of state 
parameter w where the correct model is not known. They used five different models for the equation of 
state, which were: 

1. The cosmological constant model, where it; = —1 at all times, 

2. A constant equation of state model, where w can vary in the range —1 < w < —1/3, 

3. A constant equation of state model, where w can vary in the wider range —2 < w < —1/3, 

4. A time- varying equation of state, where the two parameters have prior range —2 < wq < — 1/3 and 
-4/3 <Wa< 4/3, 

5. A time varying equation of state, where the equation of state as a function of scale factor has prior 
range —1 < w{a) < 1. 

Models (ii) and (v) placed priors on the equation of state that excluded some of the high-likelihood region 
found in models (iii) and (iv), but these models were created to enforce the weak energy condition. In 
models (iv) and (v) the CPL parameterization [321 HO] was used, where 

w{a) = Wo + (1 - a)wa ■ (15) 

They also varied two parameters common between all five models, the present fractional energy density 
of matter fl^ and the value of the Hubble parameter today Hq. They assumed a flat universe where the 
density of the dark energy f^DE = 1 — ^m- 

In terms of data used to compute the evidence of each of the models and the posteriors of the param- 
eters, they used distance data from the then-current three-year Wilkinson Microwave Anisotropy Probe 
(WMAP) observations gllllS], the BAO measurement from the Sloan Digital Sky Survey (SDSS) LRG 
sample [33], and SN la data from the HST/GOODS programme [H] and the first-year Supernova Legacy 
Survey (SNLS [IS]), together with nearby SN la data. They computed the evidence and the posteriors 
simultaneously using nested sampling. 

The model selection results were inconclusive with no model favoured and only model (v) significantly 
disfavoured. Of course, owing to the nature of the different priors allowing different high likelihood regions, 
the posteriors for the five models were very different. The authors used Bayesian Model Averaging to plot 
the model-averaged posterior for wq and Wa, averaging over all five models and also those that enforced 
the weak energy condition (i, ii and v). The results are shown in Figure [U 
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Figure 3: Model-averaged posterior probability distribution for the curvature parameter, including all 9 
models considered in the analysis, assuming the Astronomers' prior (top panel) and the Curvature scale 
prior (bottom panel) for the curvature parameter. In the bottom panel, the solid line applies to closed 
universes (fife < 0), while the dotted line to open universes (fife < 0). The peaks represent the Dirac delta 
function encompassing the probability mass associated with flat models. (From ref. j46j ) 

4.1.2 Curvature 

In [46], the authors conducted a similar analysis to Liddlc et al. [38], which was described in section [4.1. 11 
using more up-to-date data and including models for the curvature of the universe simultaneously. They 
utilized data from the WMAP five-year release [17], once again using only the distance data from the 
CMB, the updated BAO measurements from [48], and the UNION08 SN-Ia data compilation [49]. They 
considered only three different models for the dark energy, which were: 

A: The cosmological constant model, where w = —1 at all times, 

W: A constant equation of state model, where w can vary in the range —2 < w < —1/3, and Wa — 0, 

WZ; A time- varying equation of state, where the two parameters have prior range —2 < wq < —1/3 and 
-4/3 <Wa< 4/3. 

They also considered three models of the curvature of the universe: flat, open, and closed. However, 
interestingly they considered these three models with two different priors, effectively measuring two differ- 
ent parameters. The Astronomers' prior considered flk as the parameter to be measured. The Curvature 
scale prior considered the curvature scale (the log of flk or -fife depending on whether the model was 
open or closed) as the parameter to be varied. They combined the three dark energy models and the three 
curvature models together (creating nine different models in total) for each of the two choices of prior. 

They used Metropolis-Hastings Markov Chain Monte Carlo to compute the posterior distributions 
and Bayes factors. They found that no model was favoured over the flat A model, but the closed or 
open models with the Astronomers' prior were significantly disfavoured compared to those same models 
using the Curvature scale prior. They also produced model-averaging results for f2fe, averaged over all 
nine models, but considering only either the Astronomers' prior or the Curvature scale prior. The model- 
averaged posteriors are shown in Figure [3] 

4.1.3 Primordial Power Spectrum 

The models for the generation of the primordial spectrum of fluctuations arc rather better motivated than 
those of the dark energy. Since the Big Bang model became established, many theories have been proposed 
for the formation of structure in the Universe and subsequently ruled out by observational evidence (e.g. 
active seed models such as cosmic strings). 
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Figure 4: The model-averaged posterior distributions for the spectral index rig, using the WMAP Syr 
data only (top), the WMAP 7yr data only (middle) and the WMAP Tyr+ext compilation (bottom). The 
probability distribution includes a delta function around rig = 1. artificially broadened in the plot by the 
binning process. (From ref. |53j ) 

The primordial power spectrum of scalar perturbations is normally modelled through a modified power- 
law function of wavenumber fc, 

A^ik) = A^iK) i^-J , (16) 

where the amplitude is defined at a pivot scale (fc,), Ug is the spectral index (also known as the tilt), and 
rirun is the running of the spectral index. 

Harrison and Zel'dovich [501 151] both suggested a maximally-symmetric model, with equal power on 
all scales. In such a model the spectral index is unity and the running is zero. Later, when Guth [5^ 
invented the concept of cosmological inflation, it was realized that such a period of accelerated expansion 
would generate a spectrum of density fluctuations with a spectral index close to unity and a negligible 
running. 

Parkinson & Liddle |53j applied Bayesian model averaging to measurements of the parameters of the 
primordial power spectrum of density fluctuations. They considered five different models. 

1. A scale-invariant Harrison-Zel'dovich (HZ) spectrum of scalar perturbations with no tensor compo- 
nent (rig = I, r = 0). 

2. A tilted model, where the spectral index is allowed to vary, still with no tensors. 
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3. A running model, where both the spectral index and the running of the spectrum (nrun) are allowed 
to vary. 

4. A tensor model, where the spectral index of the scalar perturbations and the tcnsor-to-scalar am- 
plitude ratio (r) are allowed to vary. 

5. A tensor+running model, where the spectral index, tensor-to-scalar ratio, and running all vary. 

They also varied the physical baryon density fthh^, the physical CDM density ilch^, the angular 
diameter of the sound horizon at last scattering 9, the optical depth to reionization r, the (log) amplitude 
of the density fluctuations, ln[10-'^°^s], and the amplitude of the Sunyaev-Zel'dovich contribution to the 
CMB power spectrum Asz- They used data from the five-year and seven-year WMAP data releases, 
to compare how much improvement in model selection occurred between them, and also combined the 
seven-year WMAP with smaller-scale CMB experiments and measurements of the galaxy power spectrum 
by the Sloan Digital Sky Survey (SDSS). 

The model-averaged posterior for Ug for the different data compilations are shown in Figure H) When 
WMAP data alone was used, most of the models had similar evidence values, and so the averaged posterior 
had a substantial contribution from the delta-function from the HZ model. When the extra small scale 
and galaxy power spectrum data were added, the models where Us was allowed to vary had larger evidence 
values, and these models also had posteriors for Ug that peaked for approximately the same values. 



4.1.4 Summary 

An interesting point to note is that in each of the above cases Bayesian model averaging was applied 
with a uniform model prior. That is, none of the authors felt that one model could be favoured a priori 
over the others. Because of the mysterious nature of these phenomena (dark energy, inflation) where any 
proposed physical mechanism is still highly speculative, the authors were forced to parameterize their 
ignorance with phenomenological descriptions. But since it is unknown even if these descriptions are 
accurately representing the real mechanism, no clear choice can be made in advance. This meant that 
the dimension of the models played the largest role in determining the evidence values, and so the model- 
averaged posteriors were dominated by the models with the fewest parameters. The exception to this was 
the primordial power spectrum case, where the simplest model (HZ) is close to being ruled out, and so 
its contributions to the model-averaged posterior are almost negligible. 



4.2 Cluster weak lensing and Sunyaev-Zel'dovich effect data 

The use of multiple models in a Bayesian framework is also useful in the case where the nature of the 
data is uncertain. One such case is galaxy clusters. Galaxy clusters are the largest gravitationally-bound 
objects in the Universe, and as such can be used as cosmological probes. However, the distribution of 
matter inside the cluster is crucial in recovering measurements of their properties, and often some model 
has to be assumed before probabilistic inferences on other cosmological parameters can be made. 

Marshall et al. [12] applied Bayesian methods to the problem of cluster data modelling, attempting 
to fit the best model to the clusters' gas and temperature profiles using simulated Sunyaev-Zel'dovich 
effect data and weak gravitational lensing data. They considered two models of the gas density profile, 
the 'Beta' model (see e.g. ref. [54|), and the full hydrostatic pressure equilibrium or 'iHSE' model. Both 
models assume a one-parameter isothermal temperature profile, and the only difference comes in the gas 
density distribution. 

The Beta model has gas density profile 

Pga.(r) ^'^^^^"^ 3, , (17) 

[1 + {r/r,)^f^/^ 

where Pgas(O) is a constant relating to the density at the centre of the cluster, Tc is the characteristic scale 
of the gas density profile, and /3 is a parameter that gives the shape of the profile. This model is often 
used in cluster modelling. 

In the iHSE model the gas is in full hydrostatic pressure equilibrium, with a potential defined by the 
NFW profile [SS]. Solving for this potential, the gas profile is found to be 



Pgas(0)exp 



AirGpsr'g^fi log(l + r/r^) 



kT \ r/rg 



(18) 
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Figure 5: Marginalized probability distributions of cluster cosmological parameters from mock cur- 
rent (top) and future (bottom) experiments. The true cluster model is iHSE. P(/gas/i|i?, Beta) and 
P{fgash\D , mSFi) are plotted with full and dashed lines respectively; the dotted curve shows the result of 
model averaging, P(/gasft.|£')- In the next-generation case, the model-averaged curve is indistinguishable 
from the true model curve, as the evidence ratio between the two models is large. (From ref. [12) ) 

where G is Newton's gravitational constant, /x is the mean mass per particle (here set to be 0.59), k 
is the Boltzmann constant, and T is the temperature. Finally ri, is the characteristic scale of the dark 
matter halo, as defined in the NFW profile. Note that the characteristic scale of the gas density in the 
Beta model (rg) and the characteristic scale of the dark matter density in the iHSE model (r^) are very 
different parameters and cannot really be directly compared. 

They generated simulated data assuming contemporary facilities: observations of low-redshift clusters 
in the optical with MegaCam at CFHT, and at 30 GHz with the extended VSA [5B]. They then went on 
to consider next-generation experiments, with mock lensing observations on the ESO Wide Field Imager 
and mock SZ data from AMI [ST]. They used MCMC methods to fit the SZ data for Mgas and the weak 
gravitational lensing data for M200, assuming uniform priors on both of these parameters. The ratio of 
these two gives the gas fraction of the cluster fgash- 

The posterior distributions for the gas fraction parameter are shown in Figure [SJ Using the current 
data, the evidence values of the two models were equal within the numerical errors, and so the model- 
averaged posterior is simply the average of the two posteriors. In contrast, when using next-generation 
data the evidence of the true model was much larger than that the incorrect model (though the precise 
numerical values depended on what was assumed about the primordial CMB contamination) , and so the 
model-averaged posterior was indistinguishable from the true model posterior. 

4.3 Distance measurements 

The measurement of distances is another area in astrophysics where Bayesian methods are useful. Cepheid 
variable stars are established as distance indicators, but calibrating the Barnes-Evans Cepheid surface 
brightness relation [55] of these objects is still a challenge. Only relatively recently have interferomet- 
ric measurements of their angular diameter allowed them to be calibrated geometrically, almost fully 
independent of all other astronomical distance scales [59] . 

Barnes et al. (SO] applied Bayesian methods to this problem. They created an empirical model for the 
variation in magnitude with time (known as the light curve) and radial velocity with time, fitting them 
to Fourier polynomials of unknown order (here M is the order of the velocity model and N is the order 
of the magnitude model). Rather than fix the values of M and N , they used Reversible Jump MCMC 
and let them vary. Each polynomial has amplitude and phase, and there is also a constant term, so the 
total number of parameters is 2M + 1 for the velocity data and 2N + 1 for the magnitude data. They also 
fit for the mean angular diameter, some hyperparameters relating to accuracy, an unknown phase shift 
between the magnitude and velocity variation and (most importantly) the distance. They put a uniform 
prior on the order of the Fourier polynomials, with a cut-off value. 

They used an ensemble of thirteen Cepheids selected due to their high-quality photometry and avail- 
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ability of radial velocity data. They found for most of the Ccpheids that more than one order had 
significant posterior probability (i.e. more than one value of M or iV had significant evidence associated 
with it). They compared their results to the traditional method, where the order was selected 'by eye', 
usually with the criterion that the polynomial is terminated when the scatter in the polynomial approaches 
the uncertainty in the data. They found excellent agreement between the results from their method and 
those of the previous more traditional methodology. 

4.4 Star Classification 

The classification of objects has been a common application of Bayesian methods in astrophysics, where 
a model for the type of object can be selected using the evidence (for example see rcfs. [M] and p^*). 
Since the variables involved in the models are often very different, a model-averaging procedure does 
not necessarily lend itself to such problems. However, there are cases where the outputs of the different 
classification models can be combined in such a manner. 

In ref. [53] the authors were interested in the fast classification of variable stars using neural networks. 
The intention is that this machine-learning approach can be applied to the large numbers of objects that 
will be detected by the CoRoT, Kepler, and Gaia satellite missions in an automated fashion. They use a 
multi-layer perceptron, but instead of using the maximum likelihood estimate from a single network, they 
use Bayesian Model Averaging to combine the output from a number of different networks, effectively 
using the networks as predictive models. They tested the Bayesian Averaging Artificial Neural Network 
approach against two other approaches, the fc-dependent Bayesian Classifier [M] and the Support Vector 
Machines classifier [5S]. They found that none of the methods particularly outperformed the others, all 
giving rather disappointing misclassification rates. The Bayesian Averaging Artificial Neural Network 
proved to be the exception in one case, that of the eclipsing binaries, achieving a correct classification rate 
of 99.73%. 

This use of Bayesian Model Averaging is very interesting, as it illustrates how it can be applied in 
a situation where the underlying physical model is not known or well understood. In this case there is 
almost no physical model; the Artificial Neural Networks predict the brightness of the transients through 
empirical learning. Hence there is no prior belief about which model must be better than the other. 

5 Conclusions 

In this review we have summarised the application of Bayesian Model Averaging in a number of areas of 
astrophysics, namely cosmological model averaging for parameters associated with inflation and the dark 
energy, measurements of the baryonic gas fraction in galaxy clusters, calibrating accurate distances from 
Cepheid variable stars, and automated star classification using neural networks. We found that these 
applications fell broadly into one of two types, with model averaging being applied either in the case 
where the underlying physical model was uncertain, or the situation where some details about the data 
acquisition or modelling was uncertain. In all of these cases the model likelihood (evidence) was not able 
to discriminate between the different models, and so they all made some contribution to the averaged 
posterior. 

There are still a number of research areas in astrophysics where Bayesian model averaging has not been 
implemented, even though Bayesian methods and model selection have been adopted. It is common for 
data to be unable to select a single preferred model; quoting parameter uncertainties based on selecting one 
particular model may then misrepresent the actual uncertainty, and can be directly addressed by Bayesian 
Model Averaging. Note that including model uncertainty does not necessarily increase the uncertainty 
on model parameters, e.g. tighter constraints are obtained on the dark energy equation of state if one 
includes the cosmological constant model, given its high evidence and fixed value of w = — 1. 

Areas we feel would benefit from the model-averaging approach include questions where the parameters 
of some signal have to be estimated in the presence of some unknown number of contaminating components. 
This would include, for example, gravitational wave astronomy and sub-millimetre and radio imaging 
surveys. Then there are cases where some data products are estimated with some model assumed, even 
though the evidence does not decisively select that model against other candidates, for instance orbital 
properties of extrasolar planets in systems where additional planets are marginally detected. 

In Bayesian model averaging (as with any Bayesian method) the results have some dependence on the 
choice of model priors, especially as in the cases were model averaging is useful the model likelihoods will 
be similar. This should not deter the reader from applying such methods, as they simply require them 
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to consider more deeply which models they consider more reasonable or likely before examining the data. 
Ultimately model averaging encodes and combines uncertainty at both the model and parameter level, 
and so provides a more complete description of the current state of knowledge. 



Acknowledgments 

We thank the referees for their helpful comments and suggestions. D.P. was supported by the Australian 
Research Council through a Discovery Project grant, and A.R.L. by the Science and Technology Facilities 
Council [grant number ST/1000976/1]. A.R.L. thanks the Institute for Astronomy, University of Hawai'i, 
for hospitality while this work was carried out. 



References 

[1] D. J. C. MacKay, Information theory, inference, and learning algorithms, Cambridge University Press, 
Cambridge, 2003. 

[2] P. Gregory, Bayesian logical data analysis for the physical sciences, Cambridge University Press, Cam- 
bridge, 2005. 

[3] D. Sivia, J. Skilling, Data Analysis: A Bayesian Tutorial, 2nd edition, Oxford University Press, Oxford, 
2006. 

M. P. Hobson, A. H. Jaffe, A. R. Liddle, P. Mukherjee, D. Parkinson, Bayesian methods in Cosmology, 
Cambridge University Press, Cambridge, 2010. 

[5] R. E. Kass and A. E. Raftery, Bayes Factors, J. Am. Stat. Assoc., 90 (1995), 773-95. 

[61 H. Jeffreys, Theory of Probability (3rd ed.), Oxford: Oxford University Press, Oxford, 1961. 

R. Trotta, Applications of Bayesian model selection to cosmologieal parameters, Mon. Not. Roy. As- 
tron. Soc. 378 (2007) 72. arXiv:astro-ph/0504022 , 

A. Jaffe, Ho and Odds on Cosmology, Astrophys. J. 471 (1996) 24. |arXiv:astro-ph/9501070 | 

[9] P.S. Drell, T. J. Loredo, L Wasserman, Type la Supernovae, Evolution, and the Cosmologieal Constant, 
Astrophys. J., 530 (2000), 593. arXiv:astro-ph/9905027, 

[10] M. V. John and J. V. Narlikar, Comparison of cosmologieal models using Bayesian theory, Phys. Rev. 



D, 65 (2002), 043506. |arXiv:astro-ph/0111122 



[11] A. Slosar et al., Cosmologieal parameter esti mation and Bayesian mod el comparison using VSA data, 
Mon. Not. Roy. Astron. Soc, 341 (2003), L29. |arXiv:astro-ph/0212497( 

[12] P. J. Marshall, M. P. Hobson, A. Slosar, Bayesian joint analysis of cluster weak len sing and sunyaev- 
zel'dovich effect data, Mon. Not. Roy. Astron. Soc, 346 (2003), 489. |arXiv:astro-ph/0307098, 

[13] T. D. Saini, J. Weller, S. L. Bridle, Revealing the Nature of Dark Energy Using Bayesian Evidence, 
Mon. Not. Roy. Astron. Soc, 348 (2004), 603. a rXiv:astro-ph/030552 6. 

[14] A. Niarchou, A. H. Jaffe, L. Pogosian, Large-scale power in the CMB and new physics: an analysis 



using Bayesian model comparison, Phys. Rev. D, 69 (2004), 063515. arXiv:astro-ph/0308461 



[15] B. A. Bassett, P. S. Corasaniti, M. Kunz, The essence of quintessence and the cost of compression. 



Astrophys. J. Letts., 617 (2004), LI. arXiv:astro-ph/0407364 



[16] A. R. Liddle, P. Mukherjee, D. Parkinson, Cosmologieal Model Selection, Astronomy & Geophysics, 
47 (2006), 4.30. |arXiv:astro-ph/0608184| 

[17] R. Trotta, Bayes in the sky: Bayesian inference and model selection in cosmology, Contemp. Phys., 
49 (2008), 71. larXiv:0803.4089 [astro-ph]. 



14 



Statistical Analysis and Data Mining 



[18] A. R. Liddlc, Statistical methods for cosmological parameter selection and estimation, Ann. Rev. 
Nucl. Part. Sci., 59 (2009), 95. .arXiv:0903.4210l [astro-ph] . 

[19] J. A. Hoeting, D. Madigan, A. E. Raftery, C. T. Volinsky, Bayesian Model Averaging: A Tutorial, 
Statistical Sciences, 14.4 (1999), 382. Available at www.stat.washington.edu/www/research/online/ 
hoetingl999.pdf. 

[20] M H. Chen, Q-M. Shao, J. G. Ibrahim, Monte Carlo Methods in Bayesian Computation, Springer, 
New York, 2000. 

[21] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, E. Teller, Equation of State 
Calculations by Fast Computing Machines, Journal of Chemical Physics, 21 (1953), 1087. 

[22] W. K. Hastings, Monte Carlo sampling methods using Markov chains and their applications, 
Biometrika, 57 (1970), 97. 

[23] A. Lewis and S. Bridle, Cosmological parameters from CMB and other data: A Monte Carlo approach, 



Phys. Rev. D, 66 (2002), 103511. arXiv:astro-ph/0205436 



[24] J. Skilling, Nested Sampling, in Bayesian Inference and Maximum Entropy Methods in Science and 
Engineering, ed. R. Fischer et al., Amer. Inst. Phys., conf. proc. Springer, New York, 735 (2004), 395. 
Available at http:/ / www . i nference . phy. cam. ac . uk/bayesys /^ 

[25] J. Skilling, Nested sampling for general Bayesian computation, Bayesian Anal., 1 (2006), 833. 

[26] P. Mukherjee, D. Parkinson, A. R. Liddle, A Nested Sam pling Algorithm for Cosmological Model 
Selection, Astrophys. J. Letts., 638 (2006), L51. |arXiv:a,stro-ph/0508461i^2. 

[27] F. Feroz, M. P. Hobson, M. Bridges, MultiNest: an efficient and robust Bayesian inference tool 
for cosmology and particle physics, Mon. Not. Roy. Astron. Soc, 398 (2009), 1601. (arXiv:0809.3437l 
[astro-ph] . 

[28] O. Cappe, A. GuiUin, J.-M. Marin, C. P. Robert, Population Monte Carlo, J. Comp. Graph. Stats., 
13 (2004), 907. 

[29] R. Douc, A. GuiUin, J.-M. Marin, C. P. Robert, Convergence of adaptive mixtures of importance 
sampling schemes, Ann. Statist., 35 (2007), 420. larXiv:0708.0711l [math.ST]. 

[30] S. Kullback and R. A. Leibler, On information and sufficiency. Annals of Mathematical Statistics, 22 
(1951), 79. 

[31] M. Kilbinger, et al., Bayesian model comparison in cosmology with Population Monte Carlo, Mon. 
Not. Roy. Astron. Soc, 405 (2010), 2381. arXiv:0912.1614 [astro-ph]. 

[32] O. Cappe, R. Douc, A. Guillin, J.-M. Marin, C. P. Robert, Adaptive importance sampling in general 
mixture classes, Stat. Comput., 18:4 (2008), 447. 

[33] P. J. Green, Reversible jump Markov chain Monte Carlo computation and Bayesian model determi- 
nation, Biometrika, 82 (1995), 711. 

[34] M. Stephens, Bayesian analysis of mixture models with an unknown number of componentsan alter- 
native to reversible jump methods, Ann. Statist., 28 (2000), 4074. 

[35] O. Cappe, C. P. Robert , T. Rydcn, Reversible jump, birth-and-death and more general continuous 
time Markov chain Monte Carlo samplers, J. Roy. Statstic. Soc. B, 65 (2003), 3, 679 

[36] N. J. Cornish, T. B. Littenberg, Tests of Bayesian Model Selection Techniques for Gravitational Wave 
Astronomy, Phys. Rev. D, 76 (2007), 083006. larXiv:0704. 18081 [gr-gc] 

[37] A. Albrecht, et al.. Report of the Dark Energy Task Force, 2006. Available: arXiv.org/abs/astro- 
ph/0609591. 

[38] A. R. Liddle, P. Mukherjee, D. Parkinson, Y. Wang, Present and future evidence for evolving dark 
energy, Phys. Rev. D, 74 (2006), 123506. ,arXiv:astro-ph/0610126, 



Parkinson and Liddle: Bayesian Model Averaging in Astrophysics 



15 



[3 9] E. V. Linder , Explori ng the Expansion History of the Universe, Phys. Rev. Lett., 90 (2003), 091301. 



arXiv:astro-ph/0208512 



[40] M. Chevahier and D. Polarski, Accelerating Universes with Scahng Dark Matter, Int. J. Mod. Phys. 



D, 10 (2001), 213. arXiv:gr-qc/0009Q08 



[41] G. Hinshaw, et ah [WMAP collaboration]. Three- Year Wilkinson Microwave Anisotropy 
Probe ( WMAP) Obser vations: Temperature Analysis, Astrophys. J. Supp., 170 (2007), 288. 
|arXiv:astro-ph/060345Tl 



[42] D. N. Spergel, et al. [WMAP collaboration], Wilkinson Microwave Anisotropy Probe (WMAP) 
Three Year Results: Implications for Cosmology, Astrophys. J. Supp., 170 (2007), 377. 
larXiv:astro-ph/0603449 , 



[43] M. Tegmark, et al., Cosmological Constra ints from the SDSS Luminous Red Galaxies, Phys. Rev. D, 



74 (2006), 123507. arXiv:astro-ph/0608632 



[44] A. G. Riess, et al.. Type la Supernova Discoveries at z > 1 From the Hubble Space Telescope: 
Evidence for Past Deceleration and Constraints on Dark Energy Evolution, Astrophys. J., 607 (2004), 
665. arXiv:astro-ph/0402512, 

[45] P. Astier, et al., "The Supernova Legacy Survey: Measurement of 57m, ^^a and uu from the First Year 



Data Set", Astron. & Astrophys., 447 (2006), 31. arXiv:astro-ph/0510447 



[46] M. Vardanyan, R. Trotta, J. Silk, How flat can you get? A model comparison perspective on the 
curvature of the Universe, Mon. Not. Roy. Astron. Soc, 413 (2011), L91. larXiv:0901. 33541 [astro-ph]. 

[47] J. Dunkley, et al., [WMAP collaboration]. Five- Year Wilkinson Microwave Anisotropy Probe 
(WMAP) Observations: Likelihoods and Parameters from the WMAP data, Astrophys. J. Supp., 180 
(2009), 306. arXiv:0803.0586 [astro-ph]. 

[48] W. J. Percival, S. Cole, D. J. Eisenstein, R. C. Nichol, J. A. Peacock, A. C. Pope, A. S. Szalay, 
"Measuring the Baryon Acoustic Oscillation scale using the SDSS and 2dFGRS, Mon. Not. Roy. Astron. 
Soc., 381 (2007), 1053. arXiv:0705.3323 [astro-ph]. 

[49] M. Kowalski, et al.. Improved Cosmological Constraints from New, Old and Combined Supernova 
Datasets, Astrophys. J., 686 (2008), 749. larXiv:0804.4142l [astro-ph] . 

[50] R. Harrison, Fluctuations at the threshold of classical cosmology, Phys. Rev. D, 1 (1970), 2726. 

[51] Ya. B. Zel'dovich, A Hypothesis, Unifying the Structure and the Entropy of the Universe, Mon. Not. 
Roy. Astron. Soc., 160 (1972), Ip. 

[52] A. H. Guth, Inflationary universe: A possible solution to the horizon and flatness problems, Phys. 
Rev. D, 23 (1981), 347. 

[53] D. Parkinson, A. R. Liddle, Application of Bayesian model averaging to measurements of the pri- 
mordial power spectrum, Phys. Rev. D, 82 (2010), 103533. arXiv: 1009. 1394 [astro-ph]. 

[54] C. Sarazin, X-ray Emission from Clusters of Galaxies, Cambridge University Press, Cambridge, 1988. 

[55] J. F. Navarro, C. S. Frenk, S. D. M. White, A Universal density profile from hierarchical clustering. 



Mon. Not. Roy. Astron. Soc, 275 (1995), 720. arXiv:astro-ph/9408069 



[56] K. Lancaster, et al.. Very Small Array Observations of the Sunyaev-Zel'dovich effect in nearby galaxy 



clusters, Mon. Not. Roy. Astron. Soc, 359 (2005), 16. |arXiv:astro-ph/0405582 



[57] J. Zwart, et al.. The Arcminute Mierokelvin Imager, Mon. Not. Roy. Astron. Soc, 391 (2008), 1545. 

[58] T. G. Barnes, D. S. Evans, Stellar angular diameters and visual surface brightness - I. Late spectral 
types, Mon. Not. Roy. Astron. Soc, 174 (1976), 489. 

[59] T. E. Nordgren, B. F. Lane, R. B. Hindsley, P. Kervella, Calibration of the Barnes-Evans relation 
using interferometric observations of Cepheids, AJ, 123 (2002), 3380. ,arXiv:astro-pli/0203130, 



16 



Statistical Analysis and Data Mining 



[60] T. G. Barnes, W. H. Jcffcrys, J. O. Bcrgcr , P. J. Mueller, K. Orr, R. Rodriguez, A Bayesian analysis 
of the ccpheid distance scale, Astropliys. J., 592 (2003), 539; Erratum-ibid. Astrophys. J., 611 (2004), 
621. arXiv: a stro-pli/0303656,. 

[61] M. P. Hobson, C. McLachlan, A Bayesian approach to discrete object detection in astronomical 
datasets, Mon. Not. Roy. Astron. Soc., 338 (2003), 765. |arXiv:astro-ph/0204457 

[62] R. Savage and S. Oliver S., Bayesian methods of astronomical source extraction, Mon. Not. Roy. 
Astron. Soc., 611 (2007), 1339. |arXiv:astro-ph/0512'597| 

[63] J. Debosscher, L. M. Sarro, C. Aerts, J. Cuypers, B. Vandenbussche, R. Garrido, E. Solano, Au- 
tomated supervised classification of variable stars I. Methodology, Astron. & Astrophys., 475 (2007), 
1159.1ar2£nQQ7ir0703 [astro-pli]. 

[64] M. Sahami , Learning limited dependence Bayesian classifiers, in Second International Conference on 
Knowledge Discovery and Data Mining (Menlo Park, California: AAAI Press) (1996), 335 

[65] S. R. Gunn, M. Brown, K. M. Bossley, Network performance assessment for ncurofuzzy data mod- 
elling, 1997, in IDA 97: Proceedings of the Second International Symposium on Advances in Intelligent 
Data Analysis, Reasoning about Data (London, UK: Springer- Verlag) , 313. 



